[https://nvbugs/6330273][fix] In StorageManager.__init__, when typical_batch is supplied, append a synthetic… #15465
…id windowed-pool deadlock

When KVCacheManagerV2 is built with a typical_batch describing the working set (e.g., max_batch_size concurrent decode requests with capacity=max_seq_len), windowed pool groups whose window_size is smaller than tokens_per_block previously collapsed to min_slots=1. This happened because get_stale_range() consumed all but one block per request, and _compute_min_slots_from_constraints() only enforced an absolute floor of 1 slot per pool group. With more than one concurrent decode request, the V2 scheduler could not find a free slot in the windowed pools and deadlocked.

Synthesize a constraint from the typical_batch: one KVCacheDesc(capacity=tokens_per_block, history_length=tokens_per_block-1) per request. For every pool group, this yields non_stale=1 per request, so the new floor is len(typical_batch.kv_caches), which is large enough to support the scheduler's full concurrency.

Signed-off-by: tensorrt-cicd <90828364+tensorrt-cicd@users.noreply.github.com>
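The constraint synthesis described in the commit message can be sketched in plain Python. This is a simplified model, not the TensorRT-LLM code: `KVCacheDesc` below is a hypothetical stand-in carrying only the two fields the commit message names, and `non_stale_blocks` and `min_slots_floor` are hypothetical helpers that approximate (rather than reproduce) `get_stale_range()` and `_compute_min_slots_from_constraints()`. The model assumes a query at position `history_length` attends to the previous `window_size` tokens, and uses `window_size=None` for a full-attention pool group.

```python
from dataclasses import dataclass
from math import ceil
from typing import Dict, Optional


@dataclass(frozen=True)
class KVCacheDesc:
    # Hypothetical stand-in: only the two fields named in the commit message.
    capacity: int        # total tokens the request occupies after this step
    history_length: int  # tokens already cached before this step


def non_stale_blocks(desc: KVCacheDesc, window_size: Optional[int], tokens_per_block: int) -> int:
    """Count blocks a request still needs in one pool group (simplified model).

    Assumes a query at position `history_length` attends to the previous
    `window_size` tokens; `window_size=None` means full attention (nothing stale).
    """
    num_blocks = ceil(desc.capacity / tokens_per_block)
    if window_size is None:
        return num_blocks
    # Earliest token any new query can still attend to.
    first_needed = max(0, desc.history_length - window_size + 1)
    # Blocks strictly before the one holding `first_needed` are stale.
    return num_blocks - first_needed // tokens_per_block


def min_slots_floor(num_requests: int, pool_windows: Dict[str, Optional[int]],
                    tokens_per_block: int) -> Dict[str, int]:
    """Per-pool-group slot floor from one synthetic single-block desc per request."""
    synthetic = [
        KVCacheDesc(capacity=tokens_per_block, history_length=tokens_per_block - 1)
        for _ in range(num_requests)
    ]
    # max(1, ...) keeps the pre-existing absolute floor of one slot per group.
    return {
        group: max(1, sum(non_stale_blocks(d, window, tokens_per_block) for d in synthetic))
        for group, window in pool_windows.items()
    }


if __name__ == "__main__":
    floors = min_slots_floor(8, {"full": None, "swa_16": 16, "swa_512": 512}, tokens_per_block=32)
    print(floors)  # {'full': 8, 'swa_16': 8, 'swa_512': 8}
```

Each synthetic desc occupies exactly one block, and that block holds the token being decoded, so it can never be stale regardless of window size. Summing over requests therefore gives every pool group a floor equal to the request count (8 in this example) instead of 1.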
No actionable comments were generated in the recent review.
Configuration used: .coderabbit.yaml (review profile: CHILL, plan: Enterprise); files selected for processing: 1.
Changes: StorageManager min-slots constraint synthesis.
Estimated code review effort: 2 (Simple), ~10 minutes.
Pre-merge checks: 5 of 5 passed.
A similar fix is already included in #15462.